Visual similarity analysis of Chinese characters and its uses in Japanese OCR
Abstract
Traditionally, a Chinese or Japanese Optical Character Reader (OCR) has to represent each character category individually, either as one or more feature prototypes or as a structural description composed of manually derived components such as radicals. Here we propose a new approach in which various kinds of visual similarity between different Chinese characters are analyzed automatically at the feature level. Using this method, character categories can be related to each other by training on fonts, and character images from a text page can be related to each other based on the visual similarities they share. The method provides a way to interpret the character images on a text page systematically, rather than as a sequence of isolated character recognitions. The use of the method for postprocessing in Japanese text recognition is also discussed.
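The idea of relating character images on one page through shared visual similarity, instead of recognizing each image in isolation, can be illustrated with a small sketch. Everything here is an assumption for illustration, not the authors' method: the feature vectors are hypothetical placeholders (for example, zoned pixel densities), similarity is measured with cosine similarity, images above an assumed `threshold` are grouped with union-find, and a majority vote within each group is used as a simple consistency postprocessing step. The helper names `group_similar_images` and `consistent_labels` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar_images(features, threshold=0.95):
    """Hypothetical helper: union-find grouping of images whose pairwise
    similarity is at least `threshold` (an assumed, untuned value)."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_similarity(features[i], features[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def consistent_labels(features, ocr_labels, threshold=0.95):
    """Hypothetical postprocessing: every image in a visual-similarity group
    receives the group's most frequent isolated OCR label."""
    result = list(ocr_labels)
    for group in group_similar_images(features, threshold):
        majority, _ = Counter(ocr_labels[i] for i in group).most_common(1)[0]
        for i in group:
            result[i] = majority
    return result


# Usage: three near-identical images of one character, one of them
# misread as the visually similar 末, plus an unrelated image of 日.
features = [
    [1.0, 0.9, 0.1, 0.0],
    [0.98, 0.92, 0.12, 0.0],
    [1.0, 0.88, 0.1, 0.01],
    [0.1, 0.0, 1.0, 0.9],
]
labels = ["未", "末", "未", "日"]
print(consistent_labels(features, labels))  # ['未', '未', '未', '日']
```

The grouping step only uses similarities among images on the same page, so an isolated misrecognition can be outvoted by other instances of the same glyph; a real system would also need category-level similarity learned from fonts, which this sketch does not model.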
Similar resources
A Survey of Telugu Ocr System
Optical character recognition is usually abbreviated as OCR. The object of OCR is automatic reading of optically sensed document text materials to translate human-readable characters into machine-readable codes. Today, reasonably efficient and inexpensive OCR packages are commercially available to recognize printed texts in widely used languages such as English, Chinese, and Japanese. These sys...
Script Identification – A Han & Roman Script Perspective
All Han-based scripts (Chinese, Japanese, and Korean) possess similar visual characteristics. Hence system development for identification of Chinese, Japanese and Korean scripts from a single document page is quite challenging. It is noted that a Han-based document page might also have Roman script in them. A multi-script OCR system dealing with Chinese, Japanese, Korean, and Roman scripts, dem...
Techniques for Highly Accurate Optical Recognition of Handwritten Characters and Their Application to Sixth Chinese National Population Census
Highly accurate optical character recognition (OCR) of handwritten characters is still a challenging task, especially for languages like Chinese and Japanese. To improve the accuracy, we developed four techniques for enhanced recognition: character recognition based on modified linear discriminant analysis (MLDA), subspace-based similar-character discrimination, multi-classifier combination, an...
Mobile Application for Recognition of Japanese Writing System
The objective of this work was to implement and compare various methods which can be used for optical character recognition (OCR) of characters used in the Japanese language and create a mobile application which could recognize characters in an image captured by the camera of a device and present the user with a translation of the words into English. The engine for recognition has been...
Optical Character Recognition
Optical Character Recognition (OCR) is one of the challenging areas of pattern recognition. It gained popularity among the research community due to its vast application potentials. Extensive research has been done on OCR evidenced by a large number of research articles published in the literature during the last few decades. Most of the research works reported in this area are for Roman, Chine...